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Abstract 

Background: It is well known that DNA methylation, as an epigenetic factor, has an important effect on gene 
expression and disease development. Detecting differentially methylated loci under different conditions, such as 
cancer types or treatments, is of great interest in current research as it is important in cancer diagnosis and 
classification. However, inappropriate testing approaches can result in large false positives and/or false negatives. 
Appropriate and powerful statistical methods are desirable but very limited in the literature. 

Results: In this paper, we propose a nonparametric method to detect differentially methylated loci under multiple 
conditions for lllumina Array Methylation data. We compare the new method with other methods using simulated 
and real data. Our study shows that the proposed one outperforms other methods considered in this paper. 

Conclusions: Due to the unique feature of the lllumina Array Methylation data, commonly used statistical tests will 
lose power or give misleading results. Therefore, appropriate statistical methods are crucial for this type of data. 
Powerful statistical approaches remain to be developed. 

Availability: R codes are available upon request. 



Background 

It is well known that DNA methylation has important 
effects on transcriptional regulation, chromosomal stabi- 
lity, genomic imprinting, and X-inactivation [1,2]. It has 
been also shown to be associated with many human dis- 
eases, such as various types of cancer [3-11]. 

With the advances of BeadArray technology, genome- 
wide high-throughput methylation data can be easily gen- 
erated by lllumina GoldenGate and Infinium Methylation 
Assays. After preprocessing steps, such as background 
correction and normalization, are applied to the raw 
fluorescent intensities, for each locus, from about 30 
replicates in the same array a summarized /J-value is 
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max {M, 0} 



generated as follows: , 

* max{M,0} + max{L/,0} + 100 

where M is the average signal from a methylated allele 
while U is that from unmethylated allele. The fi -values 
are continuous numbers between 0 and 1, with 0 stands 
for totally unmethylated and 1 for completely methylated. 

It has been shown that the fi -value is rarely normally 
distributed [9,12,13]. Therefore the commonly used t-test 
for case control designs or ANOVA for multiple condi- 
tions are not the most powerful approaches when detect- 
ing differentially methylated loci. Observing this, Wang 
has proposed a model-based likelihood ratio test to detect 
differentially methylated loci for case and control data 
under the assumption that the P -value follows a three- 
component normal-uniform distribution [9]. Wang 
showed that for some situations, their proposed test was 
better than the simple t-test based on simulation studies. 
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However, in their method, Wang did not consider the 
effect of age, which has been shown highly associated 
with methylation [14,15]. Noticing the importance of 
age effect, one may use a linear regression with age 
included as a covariate when analyze methylation data 
with multiple conditions, such as cancer types. However, 
the underlying assumption of equal variances may not 
be satisfied [12]. Therefore the commonly used linear 
regression method may not be appropriate. 

In this paper, we consider methylation data with mul- 
tiple conditions and propose a nonparametric method 
which incorporates the age effect in a way through the 
idea of combining p-values from independent tests 
[12,16,17]. More specifically, we first group subjects into 
several age groups based on their age; then for each age 
group, a nonparametric Kruskal-Wallis test is conducted 
for the given locus and the p-value is recorded. An over- 
all p-value for that locus will be estimated through com- 
bining the p-values from all age groups. Using a real 
methylation data with three conditions and a simulation 
study, we show that the proposed test is more powerful 
than other methods, including linear regression. 

Method 

Proposed method 

Assume there are K conditions and G age groups. For 
each age group g (g = 1,2,...,G), we apply the nonpara- 
metric Kruskal-Wallis test and obtain a p-value p^^, 
then the overall p-value can be estimated by Fisher test 
[18]: 

PKW = Xlf._2cix^ > -2 E log(p,^^)) (1) 
Combined ANOVA test 

Similarly, we can use ANOVA to replace KW test for 
each age group and obtain an overall p-value with 
being replaced by the p-value p^'^o^^ from ANOVA test: 

PANOVA = X^f-_2GU^ > -2 E log(pf O^'^)) (2) 

Combined median test 

Another nonparametric test is median test using the fol- 
lowing statistic for each age group: 

(^fe ~ "fe/2) ^ ^here A,^ is the number of 

times that the ranks of individual observations from 
group k which excess the median from the pooled data, 
and ni( is the sample size of group k. When the sample 
sizes are large, under the null hypothesis that all 
samples have the same median, the statistic M has a 



chi-square distribution with K-1 degrees of freedom. 
The overall p-value from the combined median test can 
be calculated: 

pMedian = xLjG (X ' > "2 E logCp^"*"")) (3) 

Combined welch test 

We also consider the nonparametric Welch test. For 
each age group, we have the test statistic [19]: 

E wkih - pifl{K - 1) 
w=^ —' 

\ + [2[K-2)l[K^-2)\j:K 

K K 

where wk = tik/sl, il= J2 mh/m w= hk = [i - wk/w^yinu - i). 
Under the null hypothesis, the statistic W is asympto- 
tically distributed as F-distribution with K-1 and 

K 

f = (K^ - 1)/(3 E ^k) degrees of freedom. Welch test is 

fe=i 

an improvement of the Cochran test [20] which usually 
has inflated type I error rate especially for small sample 
sizes [19,21]. The overall p-value from the combined 
Welch test is: 

Pwelch = X!f.2dx' > -2 E logCpf'^")) (4) 

Methods for combining p-values 

Besides the Fisher method mentioned above, we also 
consider Z-test to combine p-values from indepen- 
dent tests. First we calculated the weighted Z statistic 
using individual p-values from each age group: 

G G 

Z = E "g'l'^Hl - Px)/ E where rig is the total sam- 

pie size in age group g and O is the cumulative distribu- 
tion function (CDF) of the standard normal distribution. 
It is easy to see that this statistic has standard normal 
distribution under the null hypothesis. The overall 
p-value is calculated by 1- <I'(Z). Note that here we use 
one-sided test to obtain the overall p-value. 

Simulation settings 

To compare each method applied to an individual age 
group, we simulate P -value for three treatment groups 
based on beta distribution with parameters a and b, beta 
{a,b), and truncated normal distribution on (0,1) with 
parameters |i, o^, TN(n, o^). We assume the sample sizes 
(denoted as 5 in Tables 1, 2 for the simulation results) for 
the three treatments are either balanced: s = 30 for each, 
or non-balanced: s = 20, 30, and 40. First we compare the 
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Table 1 Estimated type I error rates at significance level 0.05 with 10000 replicates. 



DictriKiitinn fc^mnlp ci7pc nAmmptprc) 


ANOVA 


mpHi;in 

1 1 ICUICII 1 


Welch 


KW 


beid {s — ou,ou,ou, 0 — 1,1,1, 0 — z,z,z) 


U.U40 


U.U4U 


U.Udz 


U.U4/ 


dG13 [S — DUpUpU, 0 — 1,1,1, 0 — lU, lU, lUJ 


U.Ujz 


U.U44 


U.Ujj 


U.Uj 1 


beta [5 = JU,iU,oU, a = lu, lu, lu, 0 = \,\,\) 


0.047 


0.044 


0.052 


0.048 


Beta (s = do.io.io, a = 10,10,10, o = 10,10,10) 


0.045 


0.045 


0.047 


0.046 


Beta (5 = 20,30,40, 0 = 1,1,1, b = 2,2,2) 


0.053 


0.052 


0.050 


0.053 


Beta (s = 20,30,40, a = 1,1,1, 6 = 10,10,10) 


0.049 


0.049 


0.054 


0.048 


Beta (s = 20,30,40, a = 10,10,10, b = 1,1,1) 


0.045 


0.049 


0.056 


0.044 


Beta {s = 20,30,40, a = 10,10,10, b = 10,10,10) 


0.050 


0.051 


0.043 


0.052 


TN (s = 30,30,30, n = 0.5,0.5, 0.5, = 0.1,0.1,0.1) 


0.050 


0.044 


0.053 


0.045 


TN(s = 30,30,30, \i = 0.5, 0.5, 0.5, ct^ = 0.1,0.2,0.3) 


0.053 


0.067 


0.047 


0.053 


TN (s = 20,30,40, = 0.5, 0.5, 0.5, = 0.1,0.1,0.1) 


0.050 


0.052 


0.052 


0.049 


TN(s = 20,30,40, |i = 0.5, 0.5, 0.5, = 0.1,0.2,0.3) 


0.047 


0.054 


0.051 


0.043 



estimated type I error rates with the given significance 
level of 0.05 under the null hypothesis of no differences 
among treatment groups. Then we compare the empirical 
powers from each method under various situations. The 
empirical power is the proportion of rejected null hypoth- 
esis to the number of replicates. 

A real data set 

We will use a real methylation data set, the United King- 
dom Ovarian Cancer Population Study (UKOPS) [15] with 
274 controls, 131 pre-treatment cases, and 135 post treat- 
ment cases, to compare the performance of the proposed 
test with others. Those methylation data were generated 
by the lUumina Infinium Huamn Methylaytion27 Bead- 
Chip and can be downloaded under accession number 
GSE19711 from the NCBI Gene Expression Omnibus 
(http://www.ncbi.nlm.nih.gov/geo). 

For this data set, there are 27578 loci. After data qual- 
ity control process, we removed some subjects with BS 
values less than 4000 or the coverage rates less than 
95%. We also separate subjects into 6 age groups 



(50-55, 55-60, 60-65, 65-70, 70-75, and 75 and over). 
Table 3 gives the numbers of subjects in each age by 
treatment groups. For each locus, we perform the above 
mentioned approaches. 

Results 

Simulation results 

Table 1 reports the estimated type I error rates from 
each method under different conditions. For most of the 
time, the estimated type I error rates are close to the 
nominal significance level as expected. Table 2 gives the 
empirical powers from each method. It can be seen that 
the non-parametric method of Mood's median test 
usually has the lowest powers in the simulations. None 
of the ANOVA, Welch and KW tests is uniformly most 
powerful. In words, their performances depend on the 
distributions from which the data are generated. From 
our simulation study, the KW test is usually as powerful 
as or more powerful than the ANOVA test. The true 
distributions of the P -value may vary from locus 
to locus; it is impossible to simulate all possible 



Table 2 Empirical power at significance level 0.05 with 10000 replicates. 



Distribution (sample sizes, parameters) 


ANOVA 


median 


Welch 


KW 


Beta (5 = 30, 30,30, a = 5,5,5,b = 20,25,30 


0.821 


0.576 


0.810 


0.775 


Beta(s = 30, 30,30, a = 1.5,2,2.5, b = 20,20,20 


0.650 


0.504 


0.648 


0.710 


Beta (s = 30, 30,30, a = 20,20,20, b = 1.5,2,2.5, 


0.658 


0.495 


0.656 


0.713 


Beta (5 = 20,30,40, a = 5,5,5, b = 20,25,30) 


0.792 


0.546 


0.740 


0.735 


Beta (s = 20,30,40, a = 1.5,2,2.5, b = 20,20,20) 


0.599 


0.479 


0.634 


0.670 


Beta (s = 20,30,40, a = 20,20,20, b = 1.5,2,2.5) 


0.607 


0475 


0.637 


0.665 


TN (5 = 30, 30,30, ^ = 0.45,0.5,0.55, ct^ = 0.2) 


0.383 


0.240 


0.378 


0.362 


TN {5 = 30, 30,30, M = 0.45,0.5,0.55, = 0.1,0.2,0.3) 


0.338 


0.325 


0.412 


0.341 


TN (s = 20,30,40, M = 0.45,0.5,0.55, ct^ = 0.2) 


0.349 


0.238 


0.343 


0.328 


TN (s = 20,30,40, M = 0.45,0.5,0.55, = 0.1,0.2,0.3) 


0.219 


0.361 


0.423 


0.259 
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Table 3 Number of samples in age group by treatment group used In the paper after removing subjects with bs 
<4000 or coverage rate <95% or age >80. 


Age group 


control 


Pre-treat 


Post-treat 


Total 


50_55 


14 


15 


16 


45 


55_60 


61 


17 


25 


103 


60_65 


64 


17 


22 


103 


65_70 


35 


17 


21 


73 


70_75 


63 


24 


22 


109 


75_over 


20 


18 


9 


47 


Total 


257 


108 


115 


480 



distributions. However, based on the observation of the 
real data, we know that the distributions of the ji -value 
are far from the normal distribution, under which 
ANOVA is the best test. Therefore, we prefer nonpara- 
metric tests which are more robust. 

Results from real data set 

or the real data set, we applied the above mentioned 
methods to get the overall p-values (either using Fisher 
or Z test to combine p-values from individual age 
groups) for each locus. Then we use various cutoff p- 
values, 0.001, 0.0001, 0.00001, and 0.000001, to count 
how many loci have smaller p-values for each method. 
Table 4 reports the results. We can see that the KW 
method usually finds more significant loci than other 
methods. It also shows that the two combining p-value 
methods, Fisher and Z test have similar performances, 
although Z test usually give a little bit more significant 
loci expect for the Median test. Figure 1 plots the 
negative loglO p-values from pairs of the methods. It 
shows that the KW method gives smaller p-values 
especially when the differences among the three treat- 
ment groups are not large (e.g., the negative loglO p- 
values between 3 and 6). From Figure 1 we can see 
that for a given cutoff p-value, most of the loci identi- 
fied by ANOVA test or Median were also detected by 
the Welch test; in turn, most of the loci identified by 
Welch test were also detected by the KW test. This 
indicates the KW test is more powerful than other 
methods compared. 



Discussion and conclusions 

Due to the unique feature of the (3 -value of methylation 
data, traditional statistical methods, such as linear regres- 
sion and ANOVA test may not be appropriate. It has been 
shown that methylation is highly correlated with age; 
ignoring age effect may cause many false positives and/or 
false negatives. The effect of age may also not be linear; 
therefore we need a better way to account for this effect. 
In this paper, we use p-value combination method to deal 
with age effect. For each age group, we use nonparametric 
method to compare the treatment groups. It is important 
to find powerful and robust nonparametric methods for 
this sort of data. Although we found that KW method is 
more powerful than some other nonparametric methods 
for methylation data, it is desirable to find more powerful 
tests in this area. Furthermore, we want to point out that 
there are many other methods can be used to combine 
p-values [22,23]; it may also be possible to find a more 
powerful method to combine p-values for Illumina Array 
Methylation data. However, based on our experiences. 
Fisher test is more robust and can be used in situations 
when a small portion of the p-values are very small; while 
the Z test is more powerful when the effect sizes are simi- 
lar (e.g., the p-values don't differ much) for all of the age 
groups. Finally, although in this paper we use different 
cutoff p-values to compare the performance of tests, one 
may want to control the false positive rate. Several multi- 
ple comparison methods have been proposed for large 
scale data set to deal with the situations where the vari- 
ables (loci) are not independent [24-28]. However, it 



Table 4 Number of significant differentially methylated loci detected for given cutoff p-value based on the real data. 



Method 




1e-3 




1e-4 




1e-5 




1e-6 


Fisher 


Z-test 


Fisher 


Z-test 


Fisher 


Z-test 


Fisher 


Z-test 


ANOVA 


981 


1079 


655 


690 


479 


499 


350 


375 


Median 


906 


893 


464 


449 


255 


240 


143 


127 


Welch 


1096 


1106 


640 


673 


416 


424 


281 


289 


K-W 


1359 


1340 


823 


859 


551 


590 


381 


401 
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(c)KW-F vs. ANOVA-F (d)KW(Z vs.Fisher) 




0 5 10 15 20 0 5 10 15 

-loglO(ANOVA-F) -loglO(KW-Z) 

Figure 1 Negative loglO(p-value) from pair of methods. Negative loglO p-values from pair of methods, (a) Combined ANOVA test vs. 
combined median test both using Fislier methods to combine p-values. (b) Combined ANOVA test vs. combined Welch test both using Fisher 
methods to combine p-values. (c) Combined ANOVA test vs. combined Kruskal-Willas test both using Fisher methods to combine p-values. 
(d) Combined KW test using Z test vs. combined KW test using Fisher to combine p-values. 



remains to study which approach is more appropriate for 
the methylation data. 
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